BSAN2204

Week 1 - Introduction to Methods of Business Analytics

Methods of Business Analytics

BSAN2204 – Methods of Business Analytics is the first course in the business analytics major after the course BSAN2201 – Principles of Business Analytics.

The focus of Methods of Business Analytics is the methods of business analytics (the course name does not lie!) and their application to solving business problems, equipping students with the skills they need to complete the business analytics major and to pursue careers with the analytics advantage.

Course Staff Team

Dr Thomas Magor (lecturer) and Ms Shannon Lutze (tutor) are your teaching team for BSAN2204. Thomas is a full-time academic in marketing with a background in econometrics and consumer choice modelling. Shannon is a current PhD student researching the effects of digital content algorithms on consumer choice.

Course Structure (2L + 2T)

The lecture series with Thomas (me) covers a combination of background theory and “click-throughs” of demonstrative examples in R Studio. There is no required practical participation in lectures, although you are encouraged to “click-along” during these sessions and to ask questions.

The tutorial series with Shannon applies the lecture content through a series of practical exercises. The tutorials require your active, hands-on-keyboard participation. To some extent (although not completely), your completion of the tutorial exercises will provide you with what is needed to complete the first two assessment items.


Tutorials begin next week (Week 2).

Contacting us

We have a Teams page! This is new to us, and it may be new to you, too. If you are unfamiliar with Teams, please visit the UQ Library’s Teams guide for more information.


You can find the BSAN2204 Teams page 👉 HERE 👈.


You can also visit us in person during our posted consultation times. See the Teams page for more information.

What are methods of business analytics?

This course is all about the Methods of Business Analytics (the course title does not lie!). Methods include everything from reading data into analysis software, visualising data, making predictions using data and restructuring/cleaning data.

This course is not a stats course, although it is a highly technical course, with a large part of what we do being writing code, interpreting statistical output and making recommendations for business based on data. You will become literate at reading and writing data!

Methods are applied using tools

This course will equip you with a basic tool kit based in R for addressing the most common sorts of data problems you will encounter as a business analyst. This means knowing when and why certain methods are appropriate to use based on the types of data a business might have.

What is R? R is a programming language and free software environment for statistical computing and data visualisation. It may not necessarily be the platform you will use in your future career, but it is a great platform for learning the basics of data analysis.

R (programming language)

You will develop a familiarity for writing code in the R programming language. No prior experience using R is necessary or expected, nor is any background in statistics or quantitative analysis.

We will start at the very beginning with downloading and installing R, as well as a suitable development environment to use R in (R Studio).

Demonstration of R as a calculator

R can be used like a basic calculator. Basic operations are written out as per below, and the result displays as plain text underneath.

# Basic math
2 + 2
[1] 4
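A few more operations in the same spirit. This is a minimal sketch using only base R arithmetic operators; the object name total is just an illustrative choice.

```r
# Arithmetic operators in base R
10 - 3       # subtraction: 7
4 * 5        # multiplication: 20
20 / 8       # division: 2.5
2^3          # exponentiation: 8
17 %% 5      # remainder after division (modulo): 2
(2 + 3) * 4  # brackets control the order of operations: 20

# A result can be stored in an object and reused later
total <- (2 + 3) * 4
total / 2    # 10
```

Each line can be typed directly at the R console, with the result printed immediately below it.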

Demonstration of R for basic analysis

We will primarily use R for data analysis, which involves reading in data, manipulating it, and performing statistical analyses. The following example uses a slightly more involved R script to generate some data, which gets assigned to an object for later use. In this case, it is 100 observations drawn from a normal distribution with a mean of 170 and a standard deviation of 10. The mean() function is then used to calculate the mean of the observations stored in the object called heights.

# Example heights data
heights <- rnorm(100, mean = 170, sd = 10)

# Calculate the average height
mean(heights)
[1] 168.389
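Because rnorm() draws random values, the mean shown above will differ slightly each time the code is run. The sketch below shows one common way to make such an example repeatable, using set.seed(); the seed value 2204 and the object name heights_again are arbitrary choices for illustration.

```r
# Fix the random number generator so the same values are drawn on every run
# (2204 is an arbitrary seed chosen for illustration)
set.seed(2204)
heights <- rnorm(100, mean = 170, sd = 10)

# Sample statistics will be close to, but not exactly, 170 and 10
mean(heights)
sd(heights)

# Minimum, quartiles, median, mean and maximum in one call
summary(heights)

# Re-running with the same seed reproduces exactly the same data
set.seed(2204)
heights_again <- rnorm(100, mean = 170, sd = 10)
identical(heights, heights_again)  # TRUE
```

Setting a seed is worth considering in any report that uses random numbers, so that the output matches the commentary each time the document is re-rendered.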

Demonstration of R for data visualisation

R can also be used to create visualisations of data, such as histograms, scatter plots, and line graphs. For example, we can plot the distribution of values stored in the heights object created above:

# Plot a histogram
hist(heights, 
     main = "Distribution of Heights", 
     xlab = "Height (cm)", 
     ylab = "Frequency",
     col = "skyblue")

Demonstration of R’s advanced graphics

R can create powerful and interactive visualisations. The following example uses the plotly package to create a 3D scatter plot of a subset of the Million Song Dataset (MSD) that we will use for the assessment project. The plot below places a song’s tempo on the x-axis, loudness on the y-axis, duration on the z-axis, and song hotness as the colour of the points. This plot is probably not very useful as it is overly complex, but it sure demonstrates the power of R’s graphics capability!

library(plotly)
my_subset <- MSD[1:1000, c("tempo", "loudness", "duration", "song_hotttnesss")]
x_var <- my_subset$tempo
y_var <- my_subset$loudness
z_var <- my_subset$duration
color_var <- my_subset$song_hotttnesss

# Create the 3D scatter plot
plot_ly(data = my_subset,
        x = ~x_var,
        y = ~y_var,
        z = ~z_var,
        color = ~color_var,
        colors = colorRamp(c("blue", "red")),
        type = "scatter3d",
        mode = "markers") %>%
  layout(title = "3D Scatter Plot of Million Song Dataset",
         scene = list(xaxis = list(title = "Tempo"),
                      yaxis = list(title = "Loudness"),
                      zaxis = list(title = "Duration")))

R Markdown

R also allows you to create dynamic reports that combine code, output, and narrative text. This is done using R Markdown, which is a format for writing documents that contain both R code and text.

In fact, all the course materials in BSAN2204 are created using R Markdown (including this slide deck!)

All assignments in BSAN2204 will be written using R Markdown and submitted as HTML files.
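To make the idea concrete, the sketch below uses R itself to write out a very small R Markdown file and, where possible, render it to HTML. The file name, title and chunk contents are all made up for illustration, and the render step assumes the rmarkdown package and pandoc are installed (it is skipped otherwise). In R Studio the same result is normally achieved by pressing the Knit button.

```r
# The chunk fence in R Markdown is three backticks; it is assembled here
# so that it can be shown inside this code example
fence <- strrep("`", 3)

# A minimal R Markdown document: header, narrative text and two code chunks
rmd_lines <- c(
  "---",
  "title: \"My first report\"",
  "output: html_document",
  "---",
  "",
  "Some narrative text explaining the analysis.",
  "",
  paste0(fence, "{r, echo=TRUE}"),
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence,
  "",
  paste0(fence, "{r, echo=FALSE}"),
  "# echo=FALSE hides this code in the report but still shows its output",
  "summary(x)",
  fence
)

# Write the document to a temporary folder (hypothetical file name)
rmd_file <- file.path(tempdir(), "my_first_report.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to HTML only if the rmarkdown package and pandoc are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  html_file <- rmarkdown::render(rmd_file, quiet = TRUE)
  print(html_file)
}
```

The echo option on each chunk controls whether the code itself appears in the rendered report, which is relevant to the assessment requirements about echoing code.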

R Studio

Below is a screenshot taken from the R Studio IDE (Integrated Development Environment) showing how these lecture slides were created. Specifically, R Markdown syntax was used to author a Quarto document which has been rendered into the slide deck you are viewing right now as a HTML file.


There are several advantages to authoring business analytics reports and documents in this way. The first is that your R code is embedded in the document, so you do not need to manually create tables or copy and paste charts and figures. The second is that the document can be easily updated by simply re-running the code, which is particularly useful for reports that need to be generated regularly.

For presentations like this one, we can embed R examples right into the slides, which allows us to demonstrate concepts in real time. As the output is formatted as a HTML file, it also allows us to make slides scrollable as well as interactive!

What are != methods?

Methods are the techniques and approaches that you use to analyse data, such as regression analysis, clustering, or time series analysis.

Things like formulating a business analytics strategy, policy setting using analytics or the more social/ethical dimensions of using data != methods of business analytics. These are all important aspects of being an effective business analyst, but they are not the focus of this course.






!= is the R operator for “not equal to”, so this is a cheeky way of saying that these topics are not the focus of this course.
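For the curious, here is how the operator behaves in practice. This is a small sketch in base R; the genres vector contains made-up values purely for illustration.

```r
# Comparison operators return TRUE or FALSE
5 != 3   # TRUE: 5 is not equal to 3
5 != 5   # FALSE
5 == 5   # TRUE: == tests for equality

# Comparisons work element by element on vectors
genres <- c("rock", "pop", "jazz", "pop")  # made-up values
genres != "pop"                            # TRUE FALSE TRUE FALSE

# A logical vector can be used to keep only the matching elements
genres[genres != "pop"]                    # "rock" "jazz"
```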

Setting the scene: Moneyball Example

In Moneyball, the value of analytics as a way to improve decision making to achieve a desired outcome is really brought to life. The methods used in forecasting sporting outcomes are the same as those used in business and a wide variety of other fields.

Analytics can significantly improve organisational outcomes across all aspects of business – including accounting/finance, human resources, operations, marketing, social media, logistics/supply chain management just to name a few… all areas of business are impacted by analytics!

Moneyball (Video, 4mins7s)

Video link: https://www.youtube.com/watch?v=Tzin1DgexlE

Miller, B. (Director). (2011). Moneyball [Film]. Copyright: Columbia Pictures.

Course Aims

The primary aim of the course is to provide students with an overview of the methods of business analytics and to prepare students to successfully enter and complete the subsequent courses in the business analytics major.

We will be using the R statistical programming language in a very hands-on way within the R Studio integrated development environment (IDE) to conduct all our analysis.

Learning Objectives

After successfully completing this course, you should be able to:

  1. Recognise and explain the role of R for business analytics,

  2. Explain the key concepts used in R for managing and manipulating data,

  3. Apply R for basic business analytics tasks, including data visualization and predictive analytics/forecasting,

  4. Compare and critically evaluate competing methods of business analytics, and

  5. Demonstrate how business analytics can inform and improve managerial decision making.

Course Plan

Assessments

Title                              Due          Weighting  Format
Data Exploration Report (A1)       Week 7       30%        *.html
Predictive Analytics Report (A2)   Week 11      40%        *.html
Practical Demonstration (A3)       Exam Period  30%        *.html

All three assessments are to be prepared using R Markdown and submitted as rendered *.html files.

The first two assessments (A1 and A2) will focus on analysing the “Million Song Dataset” (MSD). The final assessment (A3) will focus on analysing a dataset provided during the exam period.

Data Exploration Report (A1)

The Data Exploration Report (A1) will document the process of using R to explore a subset of the Million Song Dataset (MSD) using descriptive and visual methods of analysis. The descriptive methods should include measures of central tendency for continuous variables, measures of counts/proportions for categorical variables and bivariate statistics. The visual methods should include univariate and bivariate graphs.

Students must echo all R code used to generate their report and provide concise plain-text commentary explaining the R syntax and functions used (i.e., all code related to reading, processing and structuring the dataset, and code related to descriptive and visual methods of analysis). Additionally, the plain-text sections should further highlight the key insights that emerge from the descriptive and visual methods of analysis.

The Data Exploration Report (A1) must be written using R Markdown and submitted as a rendered HTML file (*.html).

The marking rubric will include criteria to assess the quality of the analysis, document presentation and adherence to submission guidelines.

A full briefing of the Data Exploration Report (A1) will be provided in class during lectures and tutorials which will outline the suggested document structure, length requirements, and submission guidelines.

Predictive Analytics Report (A2)

The Predictive Analytics Report (A2) will extend the analysis presented in the Data Exploration Report (A1). The predictive analytics will comprise using a multiple linear regression model to estimate the drivers of song hotness within a subset of the Million Song Dataset (MSD). Attempts should be made to improve the model by dealing with missing data, nonlinearity, and interactions amongst the input variables. The report should conclude with an evaluation of the model, providing a critical assessment of its usefulness for prediction using model validation.
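The workflow described above can be sketched in a few lines of base R. The example below uses the airquality dataset that ships with R as a stand-in for the MSD (an assumption made purely for illustration; its variables have nothing to do with songs). Dropping incomplete rows with na.omit() is only one simple way of dealing with missing data, and the 70/30 split, the seed, and the helper function rmse() are illustrative choices rather than requirements of the assessment.

```r
# airquality ships with R and stands in for the MSD in this sketch
data(airquality)

# 1. Missing data: count missing values per column, then keep complete rows
colSums(is.na(airquality))
aq <- na.omit(airquality[, c("Ozone", "Solar.R", "Wind", "Temp")])

# 2. Hold out 30% of the rows for validation (seed chosen arbitrarily)
set.seed(1)
test_rows <- sample(nrow(aq), size = round(0.3 * nrow(aq)))
train <- aq[-test_rows, ]
test  <- aq[test_rows, ]

# 3. Baseline multiple linear regression
fit_base <- lm(Ozone ~ Solar.R + Wind + Temp, data = train)

# 4. Extended model: a squared term for Wind (nonlinearity) and a
#    Wind-by-Temp interaction
fit_ext <- lm(Ozone ~ Solar.R + Wind + I(Wind^2) + Temp + Wind:Temp,
              data = train)
summary(fit_ext)

# 5. Validation: root mean squared error on the held-out rows
#    (rmse is a small helper defined here, not a built-in function)
rmse <- function(model, newdata) {
  sqrt(mean((newdata$Ozone - predict(model, newdata = newdata))^2))
}
c(baseline = rmse(fit_base, test), extended = rmse(fit_ext, test))
```

Comparing the two error values on data the models have not seen gives a more honest view of predictive usefulness than the fit statistics reported by summary() alone.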

Students must echo selected R code, limited only to the code relevant to the predictive analytics methods (model fitting, handling missing data, evaluating model performance). For these code sections, students must provide concise plain-text commentary explaining the analytical rationale and interpretation of results (i.e., not the R code syntax or functions used). Code related to reading, processing and structuring the dataset should not be echoed to the rendered Predictive Analytics Report (A2).

The Predictive Analytics Report (A2) must be written using R Markdown and submitted as a rendered HTML file (*.html).

The marking rubric will include criteria to assess the quality of the analysis, document presentation and adherence to submission guidelines.

A full briefing of the Predictive Analytics Report (A2) will be provided in class during lectures and tutorials which will outline the suggested document structure, length requirements, and submission guidelines.

Practical Demonstration (A3)

The Practical Demonstration (A3) is an observed form of assessment. Students will receive a dataset and a briefing written in the form of an email from a hypothetical business analytics colleague/client with a “request for a quick data summary and analysis”. The task requires students to produce a minimally formatted report using R Markdown that includes descriptive statistics, visualisations and a regression model, with concise interpretations, using the data provided. Students must complete their response to the client briefing under observation within the time constraint of an examination setting.

To mirror the tools and challenges of a professional work setting, students will be permitted to access the Internet and use AI tools during the assessment time to assist in the completion of the assessment. This assessment evaluates practical skills related to data exploration and predictive analytics, adaptability, and proficiency in using real-world tools under time constraints.

A full briefing of the Practical Demonstration (A3) will be provided in class during lectures and tutorials which will outline the suggested response structure, length, and submission guidelines. The Week 12 and 13 lectures and tutorials will cover a review of the topics relevant to the Practical Demonstration (A3).

This Practical Demonstration (A3) must be written using R Markdown and submitted as a rendered HTML file (*.html) before the end of the examination time.

The marking rubric will include criteria to assess the quality of the analysis, report presentation and adherence to submission guidelines.

The Million Song Dataset (MSD)

For the Data Exploration Report (A1) and the Predictive Analytics Report (A2) you will analyse a dataset called the Million Song Dataset (MSD). This dataset includes metadata on 1 million songs, including their musical features, titles, year of release as well as “hotness” scores for each song.

Exploring, describing and visualising the contents of the MSD is the focus of the Data Exploration Report (A1), whilst forecasting what makes a song “hot” is the focus of the Predictive Analytics Report (A2).

There is no specific business problem to answer or contextualisation expected for the two reports.

Data Exploration Report (A1) - more details

The Data Exploration Report (A1) should include the following section headings. You can start this assessment now by creating a Word or text document with these section headings, which you can later transfer to an R Markdown file.

  1. Introduction

  2. Data processing

  3. Exploratory Analysis

A more detailed briefing of what to include in these sections will be discussed in Week 3.

Introduction (A1)

For the Introduction section of the Data Exploration Report (A1) you will need to write a brief overview of the project. You could re-write the assignment briefing information in your own words. A more advanced approach would include citing some background research about the MSD (where it comes from and what potential uses the dataset has in a commercial context).

Whilst there is no specific business problem to answer or contextualisation expected for this report, having your own developed understanding of what this dataset is will help you appreciate what potential value could be derived from your analysis.

Data Processing (A1)

Data processing involves reading in the MSD, selecting a subset of the MSD for analysis and setting up any necessary data labels. In this section, you should describe and provide rationale for your data processing steps.

Working with a smaller subset of the data will enable easier processing, whilst labelling ensures all outputs from your analysis can be meaningfully interpreted. We will cover how to do these steps in upcoming lectures.
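As a preview, the three steps (reading, subsetting, labelling) might look something like the sketch below. It does not use the MSD: a tiny made-up data set is written to a temporary CSV file first so that the example runs on its own. The file name, column names, values and the minor/major labels are all hypothetical.

```r
# A tiny made-up data set written to a temporary CSV file, standing in
# for the real MSD file (all names and values here are hypothetical)
songs_file <- file.path(tempdir(), "songs_example.csv")
writeLines(c(
  "title,tempo,duration,mode",
  "Song A,120.5,215.3,1",
  "Song B,98.2,180.0,0",
  "Song C,140.0,305.7,1",
  "Song D,76.4,242.1,0"
), songs_file)

# 1. Read the data into a data frame and inspect its structure
songs <- read.csv(songs_file)
str(songs)

# 2. Select a subset: the first three rows and three of the columns
songs_subset <- songs[1:3, c("title", "tempo", "mode")]

# 3. Label a coded variable so that output is readable
songs_subset$mode <- factor(songs_subset$mode,
                            levels = c(0, 1),
                            labels = c("minor", "major"))
table(songs_subset$mode)
```

After labelling, tables and plots show "minor" and "major" rather than 0 and 1, which is what makes later output interpretable without looking up the coding.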

Exploratory Analysis (A1)

Exploratory analysis refers to the process of summarising data in order to uncover patterns and relationships prior to formal modelling. At this stage, the aim is to understand the available data and develop preliminary insights into how variables might interact.

The Data Exploration Report (A1) needs to include the following descriptive statistics and data visualisations.

Descriptive statistics: measures of central tendency for all continuous variables and measures of counts/proportions for all categorical variables. Advanced reports will also include some carefully curated and justified bivariate statistics.

Data visualisations: The report must include at least one of each of the following types of data visualisation: a univariate graph and a bivariate graph. Advanced reports will make design choices that maximise the impact of their visualisations.
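The sketch below illustrates each of these requirements using base R functions. It uses the mtcars dataset that ships with R as a stand-in for the MSD (an assumption for illustration only), and the choice of variables and the object name cars are arbitrary.

```r
# mtcars ships with R and is used here only as a stand-in for the MSD
data(mtcars)
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1),
                  labels = c("automatic", "manual"))

# Central tendency for a continuous variable
mean(cars$mpg)
median(cars$mpg)

# Counts and proportions for a categorical variable
table(cars$am)
prop.table(table(cars$am))

# A bivariate statistic: correlation between two continuous variables
cor(cars$mpg, cars$wt)

# Univariate graph: distribution of a single variable
hist(cars$mpg,
     main = "Distribution of fuel economy",
     xlab = "Miles per gallon",
     col = "skyblue")

# Bivariate graph: relationship between two variables
plot(cars$wt, cars$mpg,
     main = "Fuel economy by weight",
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon")
```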

Describing and visualising data in R will be taught in the Week 5 lecture and Week 6 tutorial.

Getting started on the project

The lectures will guide you through each of the methods needed in the two projects, but will not specifically focus on analysing the MSD. The tutorials, however, will focus entirely on applying the lecture content to the MSD.

You are strongly encouraged to keep all your files organised, as this will make it easier to work on the project throughout the semester.

In next week’s lecture we will go over some best practices for working on and writing a data analytics report in R, focusing on what will best set you up for success with the Data Exploration Report (A1).

Preparing for next week

We will cover R and R Studio installation next week, but you can get ahead by completing the steps below before class. Doing so will make it easier to follow along with the in-class R code examples demonstrated next week.

Step 1: Install R https://cran.csiro.au/

Step 2: Install RStudio https://www.rstudio.com/products/RStudio/#Desktop

Questions?

If you have any further questions after class, find us on the:



BSAN2204 Teams Page (now live!)

Thank you!